Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Implement fast multi-node via an in-cluster proxy #106

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

dhiltgen
Copy link
Contributor

@dhiltgen dhiltgen commented Sep 29, 2021

Closes #26

This implements a proxy container inside the builder pod that can be used to handle image loading and transfer tasks at the end of a build.

I'm marking this WIP for the moment as it needs more documentation updates and CI work to publish the resulting image, but the main code should be ready for review. I would like to further improve/enhance native multi-arch support, but given we don't support that yet in the prior code, splitting that out as a follow on effort feels like the right approach (and this PR is already pretty massive)

version/version.go Outdated Show resolved Hide resolved
@dhiltgen dhiltgen force-pushed the fast_multinode_v2 branch 7 times, most recently from 0a69224 to 9335b53 Compare October 1, 2021 16:35
@dhiltgen
Copy link
Contributor Author

dhiltgen commented Oct 1, 2021

PR CI is now working.

Still needs post-merge CI (could come in a follow up PR possibly)

More docs are definitely a must-have before merging.

@dhiltgen dhiltgen force-pushed the fast_multinode_v2 branch 2 times, most recently from 29cd1fa to d3d4439 Compare November 19, 2021 20:29
@codecov-commenter
Copy link

codecov-commenter commented Nov 19, 2021

Codecov Report

Merging #106 (a611fb2) into main (c03728d) will increase coverage by 1.89%.
The diff coverage is 66.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #106      +/-   ##
==========================================
+ Coverage   65.13%   67.02%   +1.89%     
==========================================
  Files          40       49       +9     
  Lines        2825     4552    +1727     
==========================================
+ Hits         1840     3051    +1211     
- Misses        761     1272     +511     
- Partials      224      229       +5     
Flag Coverage Δ
integration-tests 47.62% <16.58%> (-14.33%) ⬇️
unit-tests 37.10% <59.76%> (+27.54%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/build/output.go 78.82% <0.00%> (-3.53%) ⬇️
pkg/cmd/create.go 70.96% <33.33%> (-7.83%) ⬇️
pkg/driver/kubernetes/creation.go 67.36% <38.46%> (+2.99%) ⬆️
pkg/build/build.go 44.84% <40.56%> (-3.43%) ⬇️
pkg/proxy/authproxy.go 45.45% <45.45%> (ø)
pkg/proxy/serverimageloader.go 51.87% <51.87%> (ø)
pkg/proxy/server.go 53.61% <53.61%> (ø)
pkg/proxy/dockerdloader.go 59.09% <59.09%> (ø)
cmd/buildkit-proxy/main.go 60.46% <60.46%> (ø)
pkg/proxy/containerdloader.go 61.11% <61.11%> (ø)
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c03728d...a611fb2. Read the comment docs.

@dhiltgen dhiltgen force-pushed the fast_multinode_v2 branch 6 times, most recently from c793ee2 to ac53d2f Compare November 20, 2021 00:32
@dhiltgen dhiltgen changed the title WIP: Implement fast multi-node via an in-cluster proxy Implement fast multi-node via an in-cluster proxy Nov 20, 2021
@dhiltgen
Copy link
Contributor Author

dhiltgen commented Nov 20, 2021

Added more coverage and updated the docs to reflect the new design.

Once we wire up native multi-arch support (to follow) we should be able to get much higher coverage on build.go (there's some ~dormant code in there right now)

@dhiltgen dhiltgen force-pushed the fast_multinode_v2 branch 6 times, most recently from 621fe4d to 9fa9390 Compare January 4, 2022 18:01
@dhiltgen dhiltgen force-pushed the fast_multinode_v2 branch 3 times, most recently from 7b491af to 69bc95e Compare January 4, 2022 23:45
@dhiltgen
Copy link
Contributor Author

I wish the proxy implementation on this PR was simpler. I have some concerns this might turn into a maintenance challenge over time. What might make sense is to refine it a bit further so we have a "slow path fall-back" implemented where when/if the fast-path ever runs into problems or falls out of sync, users can still use the tool with the slow path until we update the proxy to match the new patterns.

Since this is already a patch-bomb, one possible thought is to put this on a feature branch, continue to iterate a bit more, maybe cut an RC build from that branch and let it mature a bit longer. (Open to suggestions)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance of multi-node scenarios
3 participants